Preparation and Analysis of Linguistic Corpora

Abstract

The corpus is a fundamental tool for any type of research on language. The availability of computers in the 1950s immediately led to the creation of corpora in electronic form that could be searched automatically for a variety of language features and used to compute frequencies, distributional characteristics, and other descriptive statistics. Corpora of literary works were compiled to enable stylistic analyses and authorship studies, and corpora representing general language use became widely used in the field of lexicography. In this era, the creation of an electronic corpus required entering the material by hand, and the storage capacity and speed of computers available at the time put limits on how much data could realistically be analyzed at any one time. Without the Internet to foster data sharing, corpora were typically created and processed at a single location. Two notable exceptions are the Brown Corpus of American English (Francis and Kucera, 1967) and the Lancaster-Oslo/Bergen (LOB) corpus of British English (Johansson et al., 1978); both of these corpora, each containing one million words of data tagged for part of speech, were compiled in the 1960s using a representative sample of texts produced in the year 1961. For several years, the Brown and LOB corpora were the only widely available computer-readable corpora of general language, and therefore provided the data for numerous language studies.

Similar resources

A Cross-linguistic and Cross-cultural Study of Epistemic Modality Markers in Linguistics Research Articles

Epistemic modality devices are believed to be one of the prominent characteristics of research articles as the commonly used genre among the academic community members. Considering the importance of such devices in producing and comprehending scientific discourse, this study aimed to cross–culturally and cross-linguistically investigate epistemic modality markers as an important subcategory...

Genre Analysis of ELT and Nursing Academic Written Discourse through Introduction

Since Swales’ (1981, 1990) work on the CARS model of move structure in research articles, many genre-analysis studies have been carried out, among which work on different parts of research articles in various disciplines has produced a considerable literature. This study aims to investigate the rhetorical structure of the Introduction sections of articles in two fields of English Language Teaching (...

A Comparative Analysis of Lexical Bundles in Journalistic Writing in English and Persian: A Contrastive Linguistic Perspective

This paper investigates the use of ‘lexical bundles’ in two broad corpora of journalistic writing. The aim of this study is to compare the use of lexical bundles in the two domains, one consisting of newspaper articles written in English and published in England and the other of newspaper articles written in Persian from Iranian publications. For this purpose, the frequency...

The REX corpora: A collection of multimodal corpora of referring expressions in collaborative problem solving dialogues

This paper describes a collection of multimodal corpora of referring expressions, the REX corpora. The corpora have two notable features, namely (1) they include time-aligned extra-linguistic information such as participant actions and eye-gaze on top of linguistic information, (2) dialogues were collected with various configurations in terms of the puzzle type, hinting and language. After desc...

A Comparative Analysis of Metadiscourse Markers in the Result and Discussion Sections of Literature and Engineering Research Papers

This study compares metadiscourse markers in result and discussion sections of literature and engineering research papers. To this end, 40 research articles (20 literature and 20 engineering) are selected from two major international journals. Based on Hyland’s (2005) model of metadiscourse, the articles are codified in terms of frequency, percentage, and density of interactive and interactiona...

Publication date: 2005